Abstract

Sentiment classification is widely used for product reviews 
and in online social media such as forums, Twitter, and blogs. 
However, the problem of classifying the sentiment of user 
comments on news sites has not been addressed yet. News 
sites cover a wide range of domains including politics, sports, 
technology, and entertainment, in contrast to other online social
sites such as forums and review sites, which are specific
to a particular domain. A user associated with a news
site is likely to post comments on diverse topics (e.g., politics,
smartphones, and sports) or diverse entities (e.g., Obama,
iPhone, or Google). Classifying the sentiment of users tied to
various entities may help obtain a holistic view of their personality,
which could be useful in applications such as online
advertising, content personalization, and political campaign
planning. In this paper, we formulate the problem of entity-specific
sentiment classification of comments posted on news
articles in Yahoo News and propose novel features that are 
specific to news comments. Experimental results show that 
our models outperform state-of-the-art baselines. 

Introduction 

Online news aggregator sites such as Yahoo News are a 
place for users to get in touch with developments across various
domains. In addition to reading news articles, users post
comments giving their opinions/sentiments about the topics 
or entities discussed in the news articles, while interacting 
(agreeing or disagreeing) with other users. This has resulted 
in vast amounts of User Generated Content in the form of 
user comments. An interesting characteristic of news sites 
is that they cover a wide range of domains such as politics, 
sports, technology, and entertainment, in contrast to other 
online social sites, including forums (e.g., UbuntuForums 
and TripAdvisor) and review sites (e.g., dpreview.com for 
digital cameras and notebookreview.com for laptops), which 
are specific to a particular domain. Hence, the activity of a 
user in terms of posting comments is potentially much more 
diverse in news sites as compared to other social platforms. 

Although it is not uncommon for users to make general 
comments/statements on various topics or to comment on 
unrelated entities that they like or dislike, in many cases, 
comments on a news article contain the sentiments of users 
tied to specific entities in the article (e.g., Obama or Android).
Classifying the sentiments of a particular user on diverse
entities may help obtain a holistic view of their personality.¹
For example, the sentiments of a user’s comments
on news articles tied to specific entities related to politics,
smartphones and online retail may help infer her political
orientation, preference for a particular mobile operating system
(Android vs. iOS) and liking of a particular online retailer
(Walmart vs. Target). User sentiments across articles
on an entity (e.g., iPhone) can also be followed to determine
how sentiments evolve or change over time, and what factors
can cause the sentiment change. Analyzing the sentiment
of these user comments can help understand the user
better which, in turn, can be used to provide greater personalization
and improve serving targeted ads to those users.

However, despite the evidence of strong value in analyzing
the sentiment of users tied to specific entities, there have
not been any reported works on this problem. The problem
of identifying the sentiment polarity of these comments remains
inherently difficult due to several main challenges, including
irrelevant entities and implicit sentiment.

Irrelevant entities: Comments often have entities that are
not important with respect to sentiment analysis. Let us consider
the following example:

Example 1: Great! Foxnews poll: Obama +9; CNN poll:
Obama +7; Reuters/Ipsos poll: Obama +9. I feel a landslide
in the making. Gobama! Gobama! Gobama!

In this example, the commenter has a positive sentiment 
for Obama and no sentiment for entities Foxnews, CNN, 
Reuters and Ipsos, which are irrelevant for sentiment analysis.
Unlike other domains such as product reviews where
the sentiment is expressed towards a precisely defined target
(i.e., a product or its features), known beforehand, in our
domain, the set of entities is not known a priori and covers
a wide range of entities, with many of them being irrelevant.
In the example above, a traditional sentiment classifier
would possibly identify the sentiment for Foxnews as
positive due to its close proximity with the sentiment clue 
“Great!”, leading to inaccurate results. 

Implicit sentiment: Users often express sentiments implicitly
in their comments by using irony, analogies and
rhetoric, making it hard to detect the sentiment towards
entities (Gonzalez-Ibanez, Muresan, and Wacholder 2011;
Utsumi 2000). Let us consider the following examples:

¹In adherence to Yahoo’s privacy policy, all user activity is
anonymized and the actual user’s identity is unknown to us.









Example 2: I’ve heard that Hillary Clinton modeled herself 
after Nurse Ratched. 

Example 3: Who on earth would even buy Facebook stock? 

The first example has a negative sentiment about
Hillary Clinton expressed through the analogy with “Nurse
Ratched”, who is a negative fictional character. The second
example is a rhetorical question expressing a negative
sentiment about Facebook. Typical sentiment classification
approaches would label these examples as neutral
due to the lack of sentiment clues (Ding, Liu, and Yu 2008;
Qiu et al. 2011; Zhang et al. 2011; Meng et al. 2012).

Against this background, one question that can be raised 
is: Can we design techniques to effectively identify and filter 
out irrelevant entities in news comments and further perform 
accurate sentiment classification of entities for which a sentiment
is expressed? The research that we describe in this
paper addresses specifically this question. 

Contributions. We address the problem of entity-specific 
sentiment analysis. More precisely, we formulate the problem
as a two-stage binary classification. First, we identify
entities that are relevant with respect to sentiment analysis,
while filtering out irrelevant entities. Second, we classify the
sentiment expressed towards relevant entities as positive or
negative. Although there are several works on analyzing sentiments
of news articles, the current problem is significantly
different (as detailed in Section 2). To the best of our knowledge,
there are no reported works on this problem. The contributions
of our work are as follows:

1. We propose an approach for context extraction of entities
discussed in news comments and show that it substantially
improves sentiment classification.

2. We design novel features for both classification tasks
above. Specifically, we design: (1) non-lexical features for
identifying relevant entities and show that these features are
more informative than the lexicon-based features and the
“bag-of-words” used in previous works on subjectivity analysis;
(2) comment-specific features for sentiment classification
of entities in comments.

3. We show experimentally that our sentiment classifiers
trained using the proposed features extracted from the
entity-specific contexts outperform several state-of-the-art 
approaches to sentiment classification. 

Related Work 

Sentiment analysis (SA) is widely researched due to its important
applications in mining, analyzing and summarizing
user opinions in online product reviews (Hu and Liu 2004;
Ly et al. 2011; Ding, Liu, and Yu 2008). Here, we review
some of the relevant sentiment analysis works.

Entity-independent SA (EISA): EISA deals with identifying
the sentiment of a text without linking the sentiment to an
entity for which it is expressed. EISA is mainly researched in
the domain of product reviews, where a review is assumed to
contain sentiments about a particular product and, hence, the
linking is not required (Pang, Lee, and Vaithyanathan 2002;
Pang and Lee 2004; McDonald et al. 2007; Wan 2009;
Li et al. 2012). Pang et al. (Pang, Lee, and Vaithyanathan
2002) used supervised machine learning algorithms trained
on lexical and syntactic features such as unigrams, bigrams
and POS tags, for sentiment analysis of movie reviews. In
their later work, they improve the sentiment classification by
considering only the subjective sentences and applying polarity
classifiers (developed in their previous work) on those
sentences (Pang and Lee 2004). Wan et al. (Wan 2009) use
co-training for sentiment classification of Chinese product
reviews. They use machine translation to obtain the training
data from labeled English reviews. For a Chinese review, its
Chinese features and the translated English features represent
the two independent views that are used in co-training.

Entity-dependent SA (EDSA): EDSA, on the other
hand, links sentiment to its target entity (Ding, Liu, and Yu
2008; Nasukawa and Yi 2003; Engonopoulos et al. 2011;
Zhang et al. 2011; Meng et al. 2012). Ding et al. (Ding,
Liu, and Yu 2008) performed EDSA on product reviews
using a lexicon-based approach. For an entity, they calculated
its sentiment score by adding the sentiment orientation (±1) of
opinion words co-occurring with the entity in a sentence.
Meng et al. (Meng et al. 2012) used a similar approach for
sentiment classification of tweets and determined sentiment
orientation by aggregating sentiments of opinion words. In
contrast, we use supervised learning models built using several
newly designed features in addition to lexicon-based
features. The lexicon-based approach is one of our baselines.

SA in News Sites: There are several works on sentiment
classification of news articles (Godbole, Srinivasaiah, and
Skiena 2007; Devitt and Ahmad 2007). However, sentiment
classification of news comments is a much more difficult
task compared to that of news articles since, unlike news articles,
news comments are short, noisy, incoherent, and written
in very informal styles. We found a few works
focusing on news comments for analyzing their quality of
discourse (Diakopoulos and Naaman 2011) and diversifying
them for presenting a comprehensive view of news articles
to the readers (Giannopoulos et al. 2012). However, these
works are different from ours in nature.

Problem Characterization

Sentiment classification in online social sites faces many
challenges such as dealing with unstructured text and noisy
user input, and mapping sentiment to objects or entities (Liu
2011). Beyond these, sentiment classification of news comments
brings additional challenges, i.e., a variety of domains
(e.g., politics, sports, and entertainment), lack of use of important
sentiment clues (e.g., no use of emoticons), and the
use of rhetorical questions. These additional, less studied
challenges give rise to the unique design of our model.

The main tasks of sentiment classification of news comments
are: (1) extracting entities from news comments, and
(2) identifying users’ sentiments about the extracted entities. 
Although both tasks have their own particular challenges, 
the second task is central to our study. To extract entities 
from news comments, we use the Stanford Named Entity 
Recognizer (SNER). SNER typically identifies three types 
of entities: person, place, and organization. More precisely, 
our problem can be formulated as follows. 

Problem Formulation: Given a comment and an entity, 
classify the sentiment expressed in the comment about that 
entity as: positive, negative or neutral/irrelevant. 

To address this problem, we decompose it into two parts.
First, we link the target entity with its sentiment context.
Specifically, when multiple entities are present in a comment,
each entity must be linked to its own context, i.e., the
words/phrases in the comment that are related to the entity.
This is necessary since entities in a comment may have different
sentiments or some entities may not have any sentiment
at all associated with them (as illustrated below).

Example 5: In Ohio, voting for Romney who said he would
let GM and Chrysler go bankrupt is like paying a guy to
rebuild your house that he burned down.

Here, the sentiment is negative for Romney. However,
GM and Chrysler do not have any sentiment.

Second, after entities are linked to their contexts, we identify
the sentiment for an entity to be positive, negative or
neutral, based on the sentiment of its context.

Extracting the Context of an Entity 

The context of an entity contains the words, phrases or sentences
that refer to the entity. We use several heuristics to
extract the contexts. The three main modules of our context
extraction algorithm are as follows:

1. Preprocessing: The number of entities in a comment
is checked. For single-entity comments, the entire comment
is taken as the context for the entity. If a comment contains
multiple entities, it is segmented into sentences and
given as input to the anaphora resolution module.

2. Anaphora Resolution: We use a rule-based approach
to anaphora resolution. We check the type of entity, PERSON
(P) vs. NON-PERSON (NP), and assign sentences to
the context of the entity if they have explicit mentions of
that entity or compatible anaphoric references. For example,
pronouns such as he, she, her, him can only be used to
refer to a P entity, whereas they, their, them can be used to
refer to both P and NP entities, and it can only be used for
NP entities. If a sentence does not have references to any
entity, then it is added to the context of all the entities. Also,
if a sentence has explicit mentions of multiple entities, then
it is given as input to the local context extraction module.

3. Local Context Extraction: If entities occur in clauses
that are connected with “but” (in the sentence), then the respective
clauses are returned as local contexts for the entities.
If the sentence contains a comparison between entities,
then it is split at the comparative term (adjective or adverb),
with the comparative term added to the left part, and the two
parts are returned as local contexts for the respective entities.
If neither of the two conditions is satisfied, then a window
of ±3 tokens around entities is taken as their local context.
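
The preprocessing and anaphora resolution modules can be illustrated with a minimal Python sketch. It is not the implementation used in our experiments: the regular-expression tokenizer, the function names, and the choice to match entities by substring are assumptions made for illustration, and the “but” and comparative splits of the local context module are omitted (only the ±3 token window is shown).

```python
import re

# Pronoun compatibility from the rules above: he/she/her/him -> PERSON only,
# it -> NON-PERSON only, they/their/them -> either type.
PERSON_ONLY = {"he", "she", "her", "him"}
NON_PERSON_ONLY = {"it"}
EITHER = {"they", "their", "them"}


def tokenize(sentence):
    """Lowercase word tokens; a stand-in for a real tokenizer (assumption)."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def refers_to(sentence, entity, is_person):
    """True if the sentence names the entity or uses a compatible pronoun."""
    if entity.lower() in sentence.lower():
        return True
    allowed = EITHER | (PERSON_ONLY if is_person else NON_PERSON_ONLY)
    return any(tok in allowed for tok in tokenize(sentence))


def window_context(tokens, index, size=3):
    """Tokens within +/- size positions of the entity located at `index`."""
    return tokens[max(0, index - size): index + size + 1]


def extract_contexts(sentences, entities):
    """Map each entity to the sentences assigned to its context.

    `entities` maps an entity string to True (PERSON) or False (NON-PERSON).
    """
    contexts = {e: [] for e in entities}
    if len(entities) == 1:
        # Single-entity comment: the whole comment is the context.
        only = next(iter(entities))
        contexts[only] = list(sentences)
        return contexts
    for sent in sentences:
        hits = [e for e, is_p in entities.items() if refers_to(sent, e, is_p)]
        # A sentence with no reference to any entity goes to every entity.
        for e in (hits or entities):
            contexts[e].append(sent)
    return contexts
```

In this sketch a sentence that mentions several entities is simply assigned to each of them; in the full algorithm such a sentence would be passed on to the local context extraction module.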
Identifying the Sentiment of Contexts 

After obtaining the contexts of entities, we classify their sentiment
into positive, negative or neutral sentiment classes.
We model the task of identifying sentiment as a two-step classification.
In the first step, we classify the context of an entity
into polar versus neutral sentiment classes. In the second step,
we classify the polar entities into positive or negative sentiment
classes. Below, we describe the features used in our classification
models and our reasoning behind using them.
Neutral vs. Polar Classification. As already discussed,
comments posted on news sites contain entities that are irrelevant
with respect to sentiment analysis (see Example 1 in
the Introduction). These entities have no sentiment associated with
them and are filtered out before conducting sentiment classification
of comments. We address this problem by classifying
entities as polar vs. neutral. Irrelevant entities are classified
as neutral. Generally, content features and lexicon features
form the basis of polar vs. neutral classification. However,
in our data, we find some other interesting properties
(specific to entities) that can be very helpful in identifying
neutral and polar entities. For example, an entity that is a
subject or direct object (of the subject) in a comment is more
likely to be polar than an entity that is a prepositional object.
Also, an entity of the type person is more likely to be polar
than an entity that is of non-person type. Let us consider the
following examples:

Example 9: Bush didn’t blame anyone for trashing the 
White House, the 2001 recession, or for the 3 major attacks 
on America. 


Example 10: Obama stole 716 billion dollars we paid into 
medicare. 

In Example 9, Bush is the subject, White House is the
direct object and America is the prepositional object. In Example 10,
Obama is the subject and Medicare is the prepositional
object. As we see, Obama and Bush are polar, whereas
America, White House and Medicare are neutral.

Based on this reasoning, we extract the following features 
for all entities in a comment: 

IsPerson: Whether the entity is of person type (1 if yes, 0 otherwise).
To compute this feature, we look at the entity type
output by SNER.

IsSubjObj: Whether the entity is the subject, direct object,
prepositional object or none of the three (3 if subject, 2 if
direct object, 1 if prepositional object, 0 otherwise). To compute
this feature, we check if the entity has the following
dependencies in the dependency tree: nsubj and nsubjpass
(nominal subject and passive nominal subject, resp.), dobj
(direct object) and pobj (prepositional object).

HasClues: Whether there are any polarity clues in the context of
the entity, as detailed under Polarity Clues below (1 if yes, 0 otherwise).

SentiPos: This feature is calculated from the positive sentiment
score given by the SentiStrength algorithm (Thelwall,
Buckley, and Paltoglou 2012) (0 if the score is 1, 1 otherwise);
we explain the scores output by SentiStrength in the
following section.

SentiNeg: This feature is calculated from the negative
score given by the SentiStrength algorithm (Thelwall, Buckley,
and Paltoglou 2012) (0 if the score is -1, -1 otherwise).
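
The encoding of these five features can be sketched as follows. This is an illustrative Python fragment under stated assumptions: the NER label string "PERSON", the function name, and the argument names are hypothetical, and the entity type, dependency relation, clue flag and SentiStrength scores are assumed to have been computed beforehand by the respective tools.

```python
# Ordinal encoding of the grammatical role, as described in the text.
ROLE_BY_DEPENDENCY = {"nsubj": 3, "nsubjpass": 3, "dobj": 2, "pobj": 1}


def neutral_polar_features(entity_type, dependency, has_clues,
                           senti_pos, senti_neg):
    """Build the five features for one (entity, context) instance.

    entity_type: NER label of the entity, e.g. "PERSON" (assumed label).
    dependency:  dependency relation of the entity, e.g. "nsubj".
    has_clues:   whether the context contains any polarity clue.
    senti_pos:   SentiStrength positive score, an integer in 1..5.
    senti_neg:   SentiStrength negative score, an integer in -5..-1.
    """
    return {
        "IsPerson": 1 if entity_type == "PERSON" else 0,
        "IsSubjObj": ROLE_BY_DEPENDENCY.get(dependency, 0),
        "HasClues": 1 if has_clues else 0,
        "SentiPos": 0 if senti_pos == 1 else 1,
        "SentiNeg": 0 if senti_neg == -1 else -1,
    }
```

For instance, a person entity in subject position whose context has a polarity clue and SentiStrength scores of +3 and -1 would be encoded as IsPerson=1, IsSubjObj=3, HasClues=1, SentiPos=1, SentiNeg=0.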


Positive vs. Negative Classification. After obtaining the
polar entities, we classify the sentiment about those entities
into positive or negative sentiment classes. We use the following
features for the positive-negative classification.

(a) Polarity Clues: Polarity clues are the words, phrases,
or symbols used to express polarity of opinions/emotions.
They have been used extensively in sentiment analysis (Hu
and Liu 2004; Turney 2002). We use the subjectivity lexicon
from the MPQA corpus developed by Wiebe et al. (Stoyanov,
Cardie, and Wiebe 2005) to get the polarity clues. The
lexicon contains 2006 positive clues, 4713 negative clues
and 572 neutral subjectivity clues. We extract three features,
NumPos, NumNeg, and PosVsNeg, from the context of an
entity. NumPos and NumNeg are the number of positive and
negative polarity clues in the context, respectively. PosVsNeg
is the ratio of positive to negative polarity clues with
add-one smoothing, i.e., (NumPos+1)/(NumNeg+1).

The following rules are used to count the polarity clues: 

Rule 1: Negation: If a polarity clue is connected to a
negation word (i.e., they co-occur in a window of 3 tokens),
we reverse its polarity. If a neutral subjectivity clue is connected
to a negation word, then its polarity is taken as negative.
For example, believe is a subjectivity clue with prior
polarity neutral, but if used with a negation (e.g., I do not believe
or I cannot believe) expresses negative sentiment. We
use a list of 50 negation words.
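
The clue counts and the negation rule can be sketched in Python as below. This is an illustration only: the toy lexicon format (a dictionary from clue to prior polarity), the function name, and the reading of “a window of 3 tokens” as the three tokens preceding the clue are assumptions, not details taken from our implementation.

```python
def count_clues(tokens, lexicon, negations, window=3):
    """Return (NumPos, NumNeg, PosVsNeg) for a tokenized context.

    lexicon maps a clue to "positive", "negative" or "neutral".
    A clue is treated as negated if a negation word occurs within
    `window` tokens before it (one possible reading of Rule 1).
    """
    num_pos = num_neg = 0
    for i, tok in enumerate(tokens):
        polarity = lexicon.get(tok)
        if polarity is None:
            continue
        negated = any(t in negations for t in tokens[max(0, i - window):i])
        if negated:
            # Reverse polar clues; a negated neutral clue counts as negative.
            polarity = "positive" if polarity == "negative" else "negative"
        if polarity == "positive":
            num_pos += 1
        elif polarity == "negative":
            num_neg += 1
    # Add-one smoothing keeps the ratio defined when NumNeg is zero.
    pos_vs_neg = (num_pos + 1) / (num_neg + 1)
    return num_pos, num_neg, pos_vs_neg
```

With a lexicon in which good is positive and believe is neutral, the context "i do not believe this is good" yields one negative clue (the negated believe) and one positive clue (good), so PosVsNeg is 1.0.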

Rule 2: Quotes: Users often put polarity clues in quotes
or in a quoted phrase to convey the opposite of the
sentiment normally expressed by the clue. If a polarity
clue is in a quoted phrase, then we reverse its polarity. Let
us consider this example.

Example 11: The Republican party also faces a steep climb 
with the “sane people” demographic. 

Here, the clue sane is in a quoted phrase sane people. The 
prior polarity of sane is positive. However, here, it is used to 
express negative sentiment about the Republican party. 
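
A minimal sketch of the quote rule in Python is given below. The regular expression, the helper names, and the whitespace tokenization of quoted phrases are assumptions made for illustration.

```python
import re

FLIP = {"positive": "negative", "negative": "positive"}


def quoted_tokens(text):
    """Lowercased tokens that appear inside double-quoted phrases."""
    phrases = re.findall(r'[“"]([^”"]+)[”"]', text)
    return {tok for phrase in phrases for tok in phrase.lower().split()}


def clue_polarity(token, text, lexicon):
    """Prior polarity of a clue, reversed if the clue sits inside quotes."""
    prior = lexicon.get(token.lower())
    if prior in FLIP and token.lower() in quoted_tokens(text):
        return FLIP[prior]
    return prior
```

Applied to Example 11 with a lexicon that lists sane as positive, the sketch returns a negative polarity for sane, since it occurs inside the quoted phrase.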

Rule 3: “but” rule: Usually, sentiments expressed in
clauses connected with “but” have opposite polarities. We
take this property into account while aggregating polarity
clues for the entities. If clauses containing two entities are
connected with “but” and there are explicit polarity clues in
the context of only one of the entities, then we increase the
count of the clue of opposite polarity for the other entity.

Example 12: Read how Bush tried to control the financial
situation with new regulations, but democrats blocked him.
Democrats are pathetic, greedy liars.

Here, Bush and Democrats occur in clauses connected 
with “but” and have opposite sentiment. For democrats, 
there are explicit negative clues (pathetic, greedy) but we 
do not have explicit polarity clues for Bush. In this case, we
take the value of the NumPos feature for Bush as 1.

Rule 4: Comparatives: If two entities are present in a
comparative clause and one of the entities does not have
an explicit polarity clue (in its context), then for that entity
we increment the number of the opposite polarity clue.
We identify the two most common types of comparatives: adjectival
comparatives and adverbial comparatives. We look
for JJR and RBR part-of-speech tags between entities to
identify comparative adjectives and comparative adverbs, respectively.
Let us consider this example:

Example 13: The Samsung galaxy s’ are way better than all 
the mobile products apple puts out. 

Here, Apple has a negative sentiment but does not have
any explicit polarity clue in its context. Using the rule, we
take the value of the NumNeg feature for Apple as 1.
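
Rules 3 and 4 share the same inference step, which can be sketched as follows. The function names are hypothetical, and two details are assumptions of the sketch rather than statements about our system: the polarity of the entity with explicit clues is taken to be its majority polarity, and nothing is inferred when its positive and negative counts are tied.

```python
COMPARATIVE_TAGS = {"JJR", "RBR"}   # comparative adjective / adverb


def has_comparative(pos_tags_between):
    """True if a comparative tag occurs between the two entities."""
    return any(tag in COMPARATIVE_TAGS for tag in pos_tags_between)


def infer_opposite(counts_a, counts_b):
    """Apply the opposite-polarity inference to two contrasted entities.

    Each argument is a (NumPos, NumNeg) pair. If exactly one entity has
    explicit clues, the other receives one clue of the opposite polarity.
    Returns the (possibly updated) pairs in the same order.
    """
    def opposite(counts):
        pos, neg = counts
        if pos > neg:
            return (0, 1)
        if neg > pos:
            return (1, 0)
        return (0, 0)   # tie: no dominant polarity to oppose (assumption)

    has_a, has_b = any(counts_a), any(counts_b)
    if has_a and not has_b:
        return counts_a, opposite(counts_a)
    if has_b and not has_a:
        return opposite(counts_b), counts_b
    return counts_a, counts_b
```

In Example 12, Bush has no clues and Democrats have two negative clues, so the call infer_opposite((0, 0), (0, 2)) assigns NumPos = 1 to Bush. In Example 13, the positive comparative in the Samsung context leads to NumNeg = 1 for Apple.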

(b) Punctuation Marks: It is a common practice in online
social media to use punctuation marks to express sentiments.
We look for the presence of two punctuation marks,
question and exclamation marks, in the context of an entity.
We calculate two punctuation features for a context: IsQuestion
(presence or absence of a question mark) and IsExclam
(presence or absence of an exclamation mark).


(c) Sentiment Strength: These features capture the
strength of the sentiments expressed in comments. We used
the SentiStrength algorithm (Thelwall, Buckley, and Paltoglou
2012) to compute these features. The algorithm is
specifically designed to calculate sentiment strength of short
informal texts in online social media. For a piece of text,
the algorithm computes two integer scores, one in the range
of +1 (neutral) to +5 (highly positive) that is expressive of
the positive sentiment strength of the text and another in the
range -1 (neutral) to -5 (highly negative) for negative sentiment
strength. Scores of +1 and -1 for a text mean that
the text is neutral or has no sentiment. Using SentiStrength,
we compute three features: PosStrength (positive sentiment
score), NegStrength (negative sentiment score) and PosVsNegStrength
(PosStrength divided by NegStrength).

(d) Comment-specific features: These features capture
clues that are specific to news comments. Users often use
rhetoric in their comments to express a negative sentiment
about an entity. They begin their comments by writing
rhetorical questions and/or asking rhetorical questions about
entities. Rhetorical questions are those that are not asked for
the purpose of obtaining answers or information, but rather
to make a point effectively.² Examples of rhetorical questions
are: Where is my vote? Can’t you do anything right?
Let us consider the following examples:

Example 14: PLANS? What Plans? Obama has no plans for 
his second term. 

Example 15: So now the Associated Press has to correct 
their own corrections? 

These examples express implicit negative sentiment about
Obama and Associated Press, without using explicit negative
polarity clues. To capture such rhetoric, we design
two binary features: IsFirstQues and IsEnQues. IsFirstQues
checks whether the first sentence in the context of an entity
is a question or not. IsEnQues checks if an entity is present
in a question sentence. To identify question sentences, we
check for the presence of 5W1H words and question marks.
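
The punctuation features and the two comment-specific features can be sketched together in Python. The sentence splitter and the exact question test are assumptions of this sketch: a sentence is treated as a question if it contains a question mark or begins with one of the 5W1H words, which is only one way to read the description above.

```python
import re

WH_WORDS = {"who", "what", "when", "where", "why", "how"}   # 5W1H


def split_sentences(text):
    """Naive splitter on ., ! and ? (a stand-in for a real segmenter)."""
    return [s.strip() for s in re.findall(r"[^.!?]+[.!?]*", text) if s.strip()]


def is_question(sentence):
    """Question test: a question mark, or a leading 5W1H word (assumption)."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return "?" in sentence or (bool(words) and words[0] in WH_WORDS)


def comment_features(context, entity):
    """Punctuation and comment-specific features for one entity context."""
    sentences = split_sentences(context)
    return {
        "IsQuestion": int("?" in context),
        "IsExclam": int("!" in context),
        "IsFirstQues": int(bool(sentences) and is_question(sentences[0])),
        "IsEnQues": int(any(entity.lower() in s.lower() and is_question(s)
                            for s in sentences)),
    }
```

On Example 14 the sketch sets IsFirstQues but not IsEnQues, because Obama appears only in the final declarative sentence; on Example 15 both features are set for Associated Press.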

Experiments 


Data Description 

Since there is no annotated dataset for sentiment classification
of online news comments, we prepared our own dataset.
We randomly sampled comments for annotation that satisfied
certain constraints to ensure quality and diversity of
the dataset. We first marked all the comments with the entities
present in them and ranked the entities according to
their comment frequencies. From the ranked list, we selected
43 entities to consider. These entities covered areas such
as politics (e.g., Obama, Romney), software (e.g., Google,
Microsoft), online retail (e.g., Walmart, Ebay), hardware
(e.g., Samsung, Apple), and insurance (e.g., Medicare, Obamacare),
among others. The entities were selected based on
their popularity as well as their relevance from the point of
view of user targeting. Figure 1 shows a “word cloud” of
the 43 entities. The larger the entity, the more frequent it
is in the news comments. As we see, Walmart has a much
smaller comment frequency compared with other entities
such as Barack Obama; however, it is important due to its
commercial nature and relevance to ad targeting.

²http://en.wikipedia.org/wiki/Rhetorical_question

Figure 1: Word cloud of the 43 entities used in sampling.

We sampled 526 comments such that all the comments
have at least one of the 43 entities and all the entities have
an approximately equal number of sampled comments. We then
marked the three most important entities in each comment,
obtaining 941 instances. For a comment, the number of instances
is equal to the number of entities marked for that
comment. Each instance was annotated by two annotators.
For each instance, the annotators were asked to identify the
sentiment expressed in that instance (as negative, neutral or
positive). The agreement between the annotators was 90%.
For the remaining 10%, a third annotator was asked to select
between the two annotations of the two original annotators.
Given this annotation scheme, we obtained 632 negative instances,
151 positive instances and 158 neutral instances.
Also, 41 comments are neutral, i.e., all the entities present
in them have neutral sentiment, 184 comments contain polar
as well as neutral entities and 301 comments have only polar
entities. We call the comments that have both polar and
neutral entities pseudo-polar comments.

Experimental Setting 

We conducted sentiment classification experiments using
various supervised machine learning algorithms implemented
in the Weka data mining toolkit (Hall et al. 2009).
For neutral-polar classification, Logistic Regression gave the
best performance, whereas for the positive-negative classification,
Naive Bayes outperformed other supervised methods.
To evaluate the performance of our classifiers, we report
precision, recall and F-1 score, all macro averaged across 10
folds in a cross validation setting.
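
For reference, macro-averaged precision, recall and F-1 over the classes of a single fold can be computed as in the Python sketch below. The function name is hypothetical, and the sketch assumes that the macro average is taken over classes within a fold and that the per-fold values are then averaged over the 10 folds; the experiments themselves relied on Weka rather than on this code.

```python
def macro_prf(gold, predicted):
    """Macro-averaged precision, recall and F-1 over all observed labels."""
    labels = sorted(set(gold) | set(predicted))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n
```

Because F-1 is averaged per class, the macro F-1 can fall below both the macro precision and the macro recall.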

For neutral-polar classification, we use neutral and
pseudo-polar comments. After segmenting comments into
contexts of entities, we obtain a total of 345 instances (158
neutral and 187 polar) from 225 comments. As explained
in the Problem Characterization section, an instance for
classification is a context of an
entity present in a comment. Since a comment may have
multiple entities and, hence, multiple contexts, we can obtain
more instances than the total number of comments. For
positive-negative classification, we use polar and pseudo-polar
comments (a total of 485 comments). Neutral entities
from the pseudo-polar comments are not considered in
positive-negative classification.


Baselines: We compare our sentiment classifiers with the 
following three baselines: 

1. Bag-of-words and POS tags (Jiang et al. 2011; Pang,
Lee, and Vaithyanathan 2002; McDonald et al. 2007): We
use the words in the context of an entity and the part-of-speech
tags of those words as features for classification and
experiment with two settings: 1) BoW, in which only word
frequencies are used as features, 2) BoW+POS, in which
both word frequencies and their POS tags are used as features.
We use Multinomial Naive Bayes for these models.

2. SentiStrength: SentiStrength is a state-of-the-art tool
for sentiment analysis of short informal texts posted on online
social media. We use the following two settings for turning
SentiStrength into a sentiment classifier:

(a) SentiStrength scores as features: We use the two
scores (positive and negative) output by SentiStrength
as features for sentiment classification.

(b) SentiStrength scores as rules: We use the two scores
directly as rules for making an inference about the sentiment
of a context. For neutral-polar classification, a
score of +1 and -1 implies that the text is neutral, and
polar otherwise. For the positive-negative classification,
a context is positive if its positive sentiment score is
greater than its negative sentiment score and similarly
for inferring a negative sentiment. For example, a score
of +3 and -2 implies positive polarity and a score of
+2 and -3 implies negative polarity. If both scores are
equal for a context, we randomly assign the context to
one class or the other.
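
The rule-based setting (b) can be written down directly. The following Python sketch assumes the two SentiStrength scores are already available as integers; the function names are hypothetical, and the scores are compared by magnitude, as in the example above.

```python
import random


def neutral_or_polar(pos_score, neg_score):
    """A context is neutral only when the scores are exactly +1 and -1."""
    return "neutral" if (pos_score, neg_score) == (1, -1) else "polar"


def positive_or_negative(pos_score, neg_score, rng=random):
    """Compare score magnitudes; break ties with a random assignment."""
    if pos_score > -neg_score:
        return "positive"
    if pos_score < -neg_score:
        return "negative"
    return rng.choice(["positive", "negative"])
```

For scores of +3 and -2 the second function returns positive, and for +2 and -3 it returns negative, matching the example in the text.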


3. LexiconRuleBased (Ding, Liu, and Yu 2008; Meng et
al. 2012): We compute the sentiment for an entity e in a comment
C by calculating the following score:

\[
\mathrm{score}(e, C) = \sum_{s \in C} \; \sum_{w_i :\, w_i \in s \,\wedge\, w_i \in L} \frac{w_i.SO}{d(e, w_i)} \tag{1}
\]

where s is a sentence in C, w_i is a polarity word in s, L is
the polarity lexicon, w_i.SO is the sentiment orientation of
w_i (1 if positive, -1 if negative) and d(e, w_i) is the distance
between the polarity word w_i and e in s. The denominator
down-weights the sentiment orientation of polarity words
that are far from the entity. The sentiment is positive if the
score is greater than zero, negative if the score is less than
zero and neutral otherwise. For positive-negative classification,
if we obtain a zero score, we assign the entity randomly
to the positive or the negative class.
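
The score of this baseline can be sketched in Python as follows. Several details are assumptions of the sketch: sentences are given as token lists, the entity is a single token, the distance is the absolute difference in token positions, and sentences that do not contain the entity are skipped.

```python
def entity_score(sentences, entity, lexicon):
    """Distance-weighted sum of sentiment orientations around an entity.

    sentences: list of token lists, one per sentence of the comment.
    lexicon:   maps a polarity word to +1 (positive) or -1 (negative).
    """
    score = 0.0
    for tokens in sentences:
        if entity not in tokens:
            continue              # distance undefined without the entity
        e_pos = tokens.index(entity)
        for i, tok in enumerate(tokens):
            so = lexicon.get(tok)
            if so is not None and i != e_pos:
                score += so / abs(i - e_pos)
    return score


def classify(score):
    """Map the score to a three-way sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A polarity word two tokens away from the entity thus contributes half of its orientation, while an adjacent one contributes it in full.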

4. Naive context extraction: To evaluate our context extraction
algorithm, we compare it against a simple scheme
for extracting entity contexts. We add the entire
sentence to the contexts of all the entities present in it. If a
sentence does not contain any entity, then we add it to the
context of all the entities in the comment. All other classification
settings (features and classifiers) remain the same.
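
The naive scheme amounts to a few lines of Python. As before, this is a sketch rather than the code used in the experiments; matching entities by case-insensitive substring is an assumption.

```python
def naive_contexts(sentences, entities):
    """Assign each sentence to every entity it mentions.

    A sentence that mentions no entity is added to all contexts.
    """
    contexts = {e: [] for e in entities}
    for sent in sentences:
        present = [e for e in entities if e.lower() in sent.lower()]
        for e in (present or entities):
            contexts[e].append(sent)
    return contexts
```

Unlike the proposed algorithm, this scheme performs no anaphora resolution and never splits a sentence that mentions several entities.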


Classification Results 


Neutral-polar Classification. Table 1 shows the results of
neutral-polar classification. The first five rows show the results
of the baseline models and the sixth row shows the result
of NaiveContextExtraction, whereas the subsequent four
rows show the results of models built by removing only
one feature at a time from the proposed model. The last
row shows the result of the proposed model. As can be
seen from the table, the proposed model outperforms all
the baselines and using all the features gives the best performance
with an F-1 score of 0.67. We see that the LexiconRuleBased
method and SentiStrength (rule) are the worst
performing models with F-1 scores of 0.463 and 0.466, respectively,
followed by SentiStrength (features) with an F-1
score of 0.507. This can be attributed to the fact that SentiStrength
is trained on online social media data, which
is significantly different from comments data. For example,
one of the features used by SentiStrength for detecting
sentiment is the presence of emoticons, which are generally
not present in news comments. Similarly, we see
that the BoW model is the next weakest, with an F-1
score of 0.553. Adding part-of-speech tags to BoW improves
the performance to an F-1 score of 0.565. Note that
BoWs generally perform better in other sentiment classification
tasks in domains such as Twitter (Jiang et al. 2011)
and product reviews (Pang, Lee, and Vaithyanathan 2002;
McDonald et al. 2007) compared with BoWs in our domain.
A possible reason could be the presence of implicit sentiment
in the form of rhetorical questions, sarcasm, etc., where
users do not use explicit sentiment words, and hence, there
are fewer patterns of words and common POS tags that are
generally used to express sentiment and subjectivity (e.g.,
adjectives, adverbs and common nouns) to learn the models.
Also, we see that our method outperforms NaiveContextExtraction.

Model                            Pr.     Re.     F-1
LexiconRuleBased                 0.515   0.420   0.463
SentiStrength (rule)             0.527   0.502   0.466
SentiStrength (features)         0.506   0.510   0.507
BoW                              0.553   0.557   0.553
BoW+POS                          0.565   0.565   0.565
NaiveContextExtraction           0.646   0.645   0.645
Proposed model - IsPerson        0.575   0.574   0.574
Proposed model - IsSubjObj       0.680   0.643   0.636
Proposed model - HasClues        0.667   0.664   0.664
Proposed model - SentiStrength   0.667   0.664   0.664
Proposed model                   0.671   0.670   0.670

Table 1: Neutral-polar classification results.

Model                            Pr.     Re.     F-1
LexiconRuleBased                 0.227   0.432   0.298
SentiStrength (rule)             0.44    0.462   0.452
SentiStrength (features)         0.599   0.77    0.674
BoW                              0.637   0.687   0.659
BoW+POS                          0.653   0.693   0.665
NaiveContextExtraction           0.472   0.492   0.491
Proposed model - CSF             0.673   0.715   0.687
Proposed model                   0.678   0.717   0.700

Table 2: Positive-negative classification results.

Next, we discuss the impact of removing different features
from our model. We see that removing the IsPerson
feature decreases the F-1 score to 0.574, and when IsSubjObj
is removed the performance drops to 0.636. The removal
of the HasClues and SentiStrength features (sentiPos and
sentiNeg) has a similar impact on the performance (however,
not as big as the removal of IsPerson or IsSubjObj), resulting
in an F-1 score of 0.664 in both cases. Removing the
IsPerson feature thus has the highest negative impact on
the performance, followed by the IsSubjObj feature and then
the HasClues and SentiStrength features. This observation is
consistent with the feature ranking using Information Gain (IG)
(Yang and Pedersen 1996) as output by Weka. The following
is the IG-based feature ranking: IsPerson > IsSubjObj
> SentiNeg > HasClues > SentiPos. The features on the
left side of > have a higher rank than those on the right side
of >. We see that the two proposed non-lexical features,
IsPerson and IsSubjObj, are more informative than the HasClues
and SentiStrength features, which are based on lexical
properties of comments. This suggests that in comments, entity
type (person or non-person) and its grammatical role in the
comment (subject, direct object or prepositional object) are
highly informative clues/features for polarity.
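
For readers unfamiliar with the IG criterion, the sketch below shows how such a ranking can be computed for discrete features as IG(Y; X) = H(Y) - H(Y | X). It is a generic illustration, not Weka's code and not the data used in the experiments: the toy feature values and labels are invented, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """H(labels) minus the expected entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Invented toy data: four entity mentions with two binary features.
labels = ["polar", "polar", "neutral", "neutral"]
features = {
    "FeatureA": [1, 1, 0, 0],  # perfectly separates the two classes
    "FeatureB": [1, 0, 1, 0],  # carries no information about the class
}
ranking = sorted(features, key=lambda f: information_gain(features[f], labels), reverse=True)
print(" > ".join(ranking))
```

A feature that splits the data into purer groups receives a higher gain and therefore appears further to the left in the ranking.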

Positive-negative Classification Table 2 shows the results
of positive-negative classification experiments. The first six
rows present the results of the four baseline classification
models, whereas the next two rows show the results
of the proposed model without and with comment-specific
features, denoted by Proposed model - CSF and Proposed
model, respectively. As can be seen, the lexiconRuleBased
method is the worst performing model in this setting with an
F-1 score of 0.298, followed by SentiStrength (rule), BoW
and SentiStrength (features) with F-1 scores of 0.452, 0.659
and 0.674, respectively. POS tags improve the F-1 score of
the BoW model from 0.659 to 0.665. The proposed model
outperforms all the baselines, with an F-1 score of 0.700. To
see the effect of comment-specific features on
positive-negative classification, we experimented with the proposed
model without the comment-specific features. We see that
adding comment-specific features improves the F-1 score of
the model from 0.687 to 0.700.

To analyze the importance of different features, we ranked
them using Information Gain (Yang and Pedersen 1996) and
obtained the following feature ranking: NumNeg > PosVsNeg
> NegStrnth > IsQuesMark > IsEnQues > PosStrnth
> IsQuesFirst > IsExclaim > NumPos > PosVsNegStrnth.
We see that features related to positive sentiment (PosStrnth
and NumPos) are ranked lower than the NumNeg and NegStrnth
features. One potential reason for this is that users generally
express negative sentiments more explicitly than positive
sentiments; hence, there are significantly more
negative patterns to learn from than positive ones.


Conclusion and Future Work

In this paper, we studied the problem of identifying users'
sentiments towards individual entities referenced in comments
on news articles. We identified several challenges to
this problem and proposed solutions to address them. In
particular, we designed an algorithm to extract the context of
entities in comments, proposed novel non-lexical features
for neutral-polar classification, and proposed comment-specific
features for polarity classification. Our methods outperformed
strong baselines for sentiment classification. Interesting
directions for future work include: (1) using priors on users
based on their comments on (particular or all) entities, e.g.,
a user could be pessimistic or cynical towards all entities;
(2) training a mixture of specialized classifiers for the domains
covered by a news site, e.g., politics, sports, technology, and
entertainment. We believe that people generally become more
sarcastic when they discuss politics.